Theoretical 
Computer Science 


ELSEVIER Theoretical Computer Science 231 (2000) 5-15 


www.elsevier.com/locate/tcs 


LANGAGE: A Maple package for automaton 
characterization of regular languages 


Pascal Caron 


Laboratoire d'Informatique Fondamentale et Appliquée de Rouen, Université de Rouen, 
76821 Mont-Saint-Aignan Cedex, France 


Abstract 


LANGAGE is a set of procedures for deciding whether or not a language given by its 
minimal automaton is piecewise testable, locally testable, strictly locally testable, or 
strongly locally testable. New polynomial algorithms are implemented for the last two 
properties. This package is written using the symbolic computation system Maple. It works 
with AG, a set of Maple packages for processing automata and finite semigroups. 
© 2000 Elsevier Science B.V. All rights reserved. 

1. Introduction 


Automata and regular languages theory is at the heart of theoretical computer science. 
In the past 30 years a lot of research work has been devoted to classifying regular 
languages. Schützenberger was a precursor in this domain, especially with his 
results on star-free events [9]. Several subclasses of star-free languages have been 
characterized through properties of their semigroup. This is the case for piecewise testable 
languages (PT) studied by Simon [10], locally testable languages (LT) investigated by 
both McNaughton [7] and Brzozowski and Simon [3], and strongly locally 
testable languages (SLT) introduced by Beauquier and Pin [2]. From an algorithmic 
point of view, these algebraic characterizations lead to procedures with a high time 
complexity, since computing the syntactic semigroup of a language is exponential in 
the number of states of its minimal automaton. 

The tests implemented in the LANGAGE package are based on properties of the 
minimal automaton; time complexity is polynomial. The characterization of piecewise 
testable automata is due to Simon [10], and the related algorithm to Stern [11]. For 
locally testable automata, we implement the algorithm of Kim et al. [6]. The algorithms 
dealing with strictly locally testable automata and strongly locally testable automata 
are deduced from recent characterizations given by the author in [4]. This new Maple 
package is a contribution to the development of the programming project Automate. 
The aim of this project is to provide a symbolic computation system on automata, 
languages, semigroups and words. This paper contains four sections including this one. 
Section 2 presents the theory needed to understand the algorithms. Section 3 is dedicated 
to the algorithms themselves. The last section concludes. 

Most proofs in this paper are just sketches. The complete proofs can be found in the 
full version [4]. 


2. Theoretical background 


In this section, for each family of languages one can study using the LANGAGE 
package procedures, we provide a formal definition, as well as a characterization based 
on automata properties. 

Let us first recall that a finite state automaton $M$ is a 5-tuple $(\Sigma, Q, i, F, \delta)$ where 
$\Sigma$ is a finite alphabet, $Q$ is a finite set of states, $i \in Q$ is the initial state, $F \subseteq Q$ is 
the set of terminal states, and $\delta : Q \times \Sigma \to Q$ is the transition function. We shall use the 
term connected component (CC) to refer to any subgraph whose underlying undirected 
graph is connected. By SCC we mean a strongly connected component. An SCC $C_1$ 
is an ancestor (resp. descendant) of an SCC $C_2$ if there exists a path from $C_1$ to $C_2$ 
(resp. from $C_2$ to $C_1$). We will use a classical algorithm described in [1] to find all 
SCCs of a state transition graph. 

Three procedures of the LANGAGE package are devoted to the study of local 
testability. This notion is classically illustrated [2] by considering a window of small size, 
which is moved along the input word, so that information may be logged from the strings 
appearing in the window, regardless of their number or their order. Different 
kinds of local testability can be described by means of variants of this mechanism. 


2.1. Strictly locally testable languages (sLT) 


For strictly local testability, a window of size k scans the input word in order to 
verify that its prefix of length k is a good one, all interior factors of length k are good 
ones and its suffix of length k is a good one. More precisely, we will use the definition 
given by McNaughton and Papert in [8]. 


Definition 2.1. Let $k$ be a positive integer. For $w \in \Sigma^*$ of length $> k$, let $L_k(w)$, $R_k(w)$ 
and $I_k(w)$ be respectively the prefix of length $k$, the suffix of length $k$ and the set 
of interior factors of length $k$ of the word $w$. $L \subseteq \Sigma^*$ is strictly $k$-testable if and 
only if there exist three sets $X, Y, Z$ of words on $\Sigma$ such that for all $w \in \Sigma^*$, $|w| > k$, 
$w \in L$ iff $L_k(w) \in X$, $R_k(w) \in Y$ and $I_k(w) \subseteq Z$. A language is strictly locally testable 
if it is strictly $k$-testable for some $k > 0$. 
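
As an illustration of Definition 2.1, the following Python sketch computes the prefix, the suffix and the set of interior factors of length $k$ of a word and applies the membership condition. It is not part of the LANGAGE package. The function names `prefix_suffix_interior` and `accepts` are hypothetical, and we assume that "interior" means every occurrence of a factor of length $k$ other than the one at the very beginning and the one at the very end of the word.

```python
def prefix_suffix_interior(w, k):
    """Return (L_k(w), R_k(w), I_k(w)) for a word w with len(w) > k > 0."""
    if k <= 0 or len(w) <= k:
        raise ValueError("need len(w) > k > 0")
    prefix = w[:k]
    suffix = w[-k:]
    # Assumption: interior occurrences start at positions 1 .. len(w)-k-1,
    # i.e. neither the prefix occurrence nor the suffix occurrence.
    interior = {w[j:j + k] for j in range(1, len(w) - k)}
    return prefix, suffix, interior


def accepts(w, k, X, Y, Z):
    """Membership condition of Definition 2.1 for the sets X, Y, Z."""
    prefix, suffix, interior = prefix_suffix_interior(w, k)
    return prefix in X and suffix in Y and interior <= Z


# Example sets for k = 2 (chosen for illustration): words made of repeated "ab".
X, Y, Z = {"ab"}, {"ab"}, {"ab", "ba"}
print(accepts("ababab", 2, X, Y, Z))  # True
print(accepts("abba", 2, X, Y, Z))    # False (the suffix "ba" is not in Y)
```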


Definition 2.2 (Local automaton). Let $M = (\Sigma, Q, i, F, \delta)$ be a deterministic automaton 
and let $k$ be a positive integer. $M$ is $k$-local if, $\forall w \in \Sigma^k$, the set $\{q \cdot w \mid q \in Q\}$ contains 
at most one element. An automaton is local if it is $k$-local for some $k > 0$. 


Theorem 2.3. A language is strictly locally testable iff its trim minimal automaton 
is local. 
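
The definition of a $k$-local automaton can be checked directly, although only by brute force. The Python sketch below is an illustration, not the algorithm implemented in LANGAGE: it enumerates all words of length $k$, so it is exponential in $k$. The name `is_k_local` and the encoding of the transition function as a dictionary mapping `(state, letter)` to a state (with missing entries meaning "undefined", as in a trim automaton) are assumptions made for the example.

```python
from itertools import product


def is_k_local(states, alphabet, delta, k):
    """Brute-force check that {q.w | q in Q} has at most one element
    for every word w of length k."""
    for word in product(alphabet, repeat=k):
        reached = set()
        for q in states:
            cur = q
            for a in word:
                cur = delta.get((cur, a))
                if cur is None:      # q.w is undefined
                    break
            if cur is not None:
                reached.add(cur)
        if len(reached) > 1:
            return False
    return True


# Trim automaton of (ab)+ : 0 -a-> 1 -b-> 2 -a-> 1
ab_plus = {(0, "a"): 1, (1, "b"): 2, (2, "a"): 1}
print(is_k_local([0, 1, 2], "ab", ab_plus, 2))

# Trim automaton of (aa)*b : reading a^k from states 0 and 1 always gives two states
even_a = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 2}
print(any(is_k_local([0, 1, 2], "ab", even_a, k) for k in range(1, 6)))
```

Under these assumptions the first automaton is 2-local, while the second is not $k$-local for any of the tested values of $k$.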


Proof. For the only if part of the statement, we argue by contradiction. 
Consider a strictly locally testable language $L$ and a word $w$ (long enough) leading to 
two different states (say $q$ and $q'$) in the minimal automaton of $L$. Strict locality 
implies that all successful paths containing $w$ as a factor have the same suffixes. By 
minimality, this implies that $q = q'$. The if part is shown by induction on the length of 
the words of the language. 


Definition 2.4 (s-local, pairwise s-local). Let $M = (\Sigma, Q, i, F, \delta)$ be an automaton. 

1. A strongly connected component (SCC) $C$ of the state transition graph of $M$ is 
s-local if and only if there do not exist two distinct states $p$ and $q$ in $C$ and a word 
$w$ in $\Sigma^*$ such that $\delta(p,w) = p$ and $\delta(q,w) = q$. 

2. Let $C_1$ and $C_2$ be two distinct SCCs; then $C_1$ and $C_2$ are pairwise s-local iff there 
do not exist two distinct states $p$ and $q$ respectively in $C_1$ and $C_2$ and a word 
$w \in \Sigma^*$ such that $\delta(p,w) = p$ and $\delta(q,w) = q$. 


The algorithm we have implemented for testing whether a language is strictly locally 
testable or not is deduced from the following theorem. 


Theorem 2.5. A language is strictly locally testable iff the state transition graph of 
its trim minimal automaton has the following properties: 

1. All SCCs of the state transition graph are s-local. 

2. All SCCs of the state transition graph are pairwise s-local. 


Proof. If there exist a word $w$ and two distinct states $p$ and $q$ such that $\delta(p,w) = p$ and 
$\delta(q,w) = q$, then the same holds for every power of $w$; hence there are arbitrarily long 
words $w$ for which the set $\{q \cdot w \mid q \in Q\}$ has more than one element, and the automaton 
is not local. For the converse, we consider two distinct paths of length $> n^2$ with the same 
label, where $n$ is the number of states of the minimal automaton, and we show that 
there exists one pair of states encountered twice. 


2.2. Locally testable languages 


A locally testable language $L$ is a language with the property that, for some integer $k$, 
whether or not a word $u$ is in the language depends (1) on the prefix and suffix of 
the word $u$ of length $k - 1$ and (2) on the set of intermediate substrings of length $k$ 
of the word $u$. A more formal definition is proposed by Zalcstein in [12]. 


Definition 2.6 (LT). A k-testable language is a boolean combination of strictly k- 
testable languages. A language is locally testable if it is k-testable for some k > 0. 


Kim et al. [6] describe a polynomial algorithm making it possible to decide whether 
a minimal deterministic automaton recognizes a locally testable language or not. We 
list here some definitions and theorems on which this algorithm is based. 


Definition 2.7 (Transition span). Let $M = (\Sigma, Q, i, F, \delta)$ be an automaton and let $C$ be 
a connected component of the state transition graph of $M$. The transition span in $C$ 
of a state $p \in Q$ is the set $TS(C, p) = \{x \in \Sigma^* \mid \text{for every prefix } w \text{ of } x,\ \delta(p,w) \text{ is in } C\}$. 


Definition 2.8 (TS-equivalence). Let C be a CC of the state transition graph of an 
automaton. States p and q are TS-equivalent in C iff TS(C, p)=TS(C,q). 


Definition 2.9 (TS-local). Let $C_i$ be an SCC of the state transition graph of a 
deterministic automaton. The graph is TS-local w.r.t. $C_i$ if and only if it has the following 
property: for every SCC $C_j$ which is an ancestor of $C_i$, either 

• $C_i$ and $C_j$ are pairwise s-local, or otherwise 

• there exists a pair of states $p$ and $q$ respectively in $C_i$ and $C_j$ such that $p$ and $q$ 
are TS-equivalent in $C_{ij}$ (the reaching component from $C_j$ to $C_i$). 

The state transition graph of an automaton is TS-local if and only if for every SCC 
$C_i$ of the graph, it is TS-local w.r.t. $C_i$. 


Theorem 2.10 (Characterization). A minimal deterministic automaton is locally 
testable if and only if it satisfies the following conditions: 

1. All SCCs of the state transition graph are s-local. 

2. The state transition graph is TS-local. 


2.3. Strongly locally testable languages 


The notion of strongly locally testable languages is a variation of the notion of 
locally testable languages. Only substrings of length k of a word u are needed to know 
whether or not this word belongs to a strongly locally testable language. The following 
definition is quite usual [2]. 


Definition 2.11. $L$ is strongly locally testable if it is a finite boolean 
combination of languages of the form $\Sigma^* w \Sigma^*$ where $w \in \Sigma^*$. 


Definition 2.12. Let $M = (\Sigma, Q, i, F, \delta)$ be an automaton and $C$ be an SCC of the state 
transition graph of $M$. Let $L_C^k$ be the language relative to the strongly connected 
component $C$ and to the positive integer $k$: 

$$L_C^k = \{u \mid |u| > k \text{ and } \exists x \in C \text{ such that } \delta(x,u) \in C\}.$$

The prefix language of the SCC $C$, denoted by $P_C$, is defined as follows: 

$$P_C = \{u \mid \exists v \in \Sigma^* \text{ such that } \delta(i, u \cdot v) \in C\}.$$


Theorem 2.13. Let $L$ be a language and $M = (\Sigma, Q, i, F, \delta)$ be its minimal automaton. 
$L$ is strongly locally testable iff the following conditions are verified: 

1. $L$ is locally testable. 

2. For every SCC $C$, $\forall p, q \in C$, $p \in F \Leftrightarrow q \in F$. 

3. For every SCC $C$ of the state transition graph of $M$, $\exists k$ such that $L_C^k \subseteq P_C$ or 
$L_C^k \cap P_C = \emptyset$. 


Proof. The proof of Theorem 2.13 is in three steps. First we prove that (3) is equivalent 
to saying that if two words $x, y$ have their image in the same Y-class, then these 
words are labels of two paths leading to the same SCC. Then we show that a strongly 
locally testable language has this last property. Finally we prove from (1), (2) and (3) 
that the Y-classes of the syntactic semigroup are saturated by the language (i.e. the 
language is strongly locally testable). See [2]. 


In the following, we will write $L_C$ for $L_C^k$ whenever it is not ambiguous. 


2.4. Piecewise testable languages 


Definition 2.14. A language is piecewise testable if it is a boolean combination of 
languages of the form $\Sigma^* a_1 \Sigma^* a_2 \ldots \Sigma^* a_n \Sigma^*$ where $a_i \in \Sigma$, $i = 1, \ldots, n$. 
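
A word belongs to a language of the form used in Definition 2.14 exactly when the letters $a_1, \ldots, a_n$ occur in it in this order, not necessarily consecutively. The short Python sketch below illustrates this reading; the function name `has_subword` and the example language are hypothetical and serve only as an illustration.

```python
def has_subword(u, pattern):
    """True iff the letters of `pattern` occur in `u` in order
    (not necessarily consecutively)."""
    it = iter(u)
    # `letter in it` advances the iterator up to the first match,
    # so successive letters are searched for further to the right.
    return all(letter in it for letter in pattern)


def example_language(u):
    """A boolean combination of two such languages: words containing
    the subword "ab" but not the subword "ba"."""
    return has_subword(u, "ab") and not has_subword(u, "ba")


print(has_subword("abcab", "cb"))   # True
print(example_language("aabb"))     # True
print(example_language("aba"))      # False
```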


The minimal automaton of a piecewise testable language has been characterized by 
Simon [10]. In order to state Simon’s result we need some additional definitions. 

Let $G = (Q, \delta)$ be the state transition graph of an automaton $M$, and $\Gamma$ a subset 
of $\Sigma$. The state transition graph of $M$ on $\Gamma$ is the graph $G_\Gamma = (Q, \delta_\Gamma)$ such that 
$\delta_\Gamma = \{(x, a, y) \in \delta \mid a \in \Gamma\}$. Recall that a graph $G$ is acyclic if all its SCCs have only 
one element. In this case, the set of vertices of $G$ can be partially ordered. We can 
now state Simon's result. 


Theorem 2.15. Let $L$ be a language and $M = (\Sigma, Q, i, F, \delta)$ its minimal automaton. $L$ 
is piecewise testable if and only if the following two conditions hold: 

1. The state transition graph $G$ of $M$ is acyclic. 

2. For any subset $\Gamma$ of $\Sigma$, each CC of $G_\Gamma = (Q, \delta_\Gamma)$ has a unique maximal state. 


3. Algorithms 


In this section we will give a general outline of each of the algorithms implemented 
in the LANGAGE package, and supply a full description of our procedure for testing 
SLT languages. Maple provides set, list, and table data types. Therefore the 
pseudo-code given for this last algorithm is very close to the Maple code. 

For time complexity analysis, we denote by n the number of states of the automaton, 
by s the size of the alphabet and by m the number of edges of the graph of the 
automaton. 


3.1. Algorithm for sLT languages 


Owing to the notion of pair-graph introduced by Kim et al. [6] we state a lemma 
which yields an efficient implementation of Theorem 2.5. 


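Definition 3.1 (Pair-graph). Let $M = (\Sigma, Q, i, F, \delta)$ be an automaton. Let $Q_1$ and $Q_2$ 
be two subsets of $Q$. Let $*$ be a symbol not in $Q$. The pair-graph on $Q_1 \times Q_2$ is the 
edge-labeled directed graph $G(V,E)$, where $V = (Q_1 \cup \{*\}) \times (Q_2 \cup \{*\}) - \{(*,*)\}$ and 
$E$ is defined as follows. 

Define $\delta_i : Q_i \times \Sigma \to Q_i \cup \{*\}$, $i = 1, 2$, such that for all $p \in Q_i$ and $a \in \Sigma$, 

$$\delta_i(p,a) = \begin{cases} q & \text{if } \delta(p,a) = q \in Q_i, \\ * & \text{otherwise.} \end{cases}$$

Then $E = \{((p,q), a, (r,s)) \mid p \in Q_1, q \in Q_2, p \neq q, r \in Q_1 \cup \{*\}, s \in Q_2 \cup \{*\}, (r,s) \neq (*,*),$ 
$\delta_1(p,a) = r, \text{ and } \delta_2(q,a) = s\}$. 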


Lemma 3.2. Let $C_i$ and $C_j$ be two distinct SCCs of the state transition graph of an 
automaton $M = (\Sigma, Q, i, F, \delta)$. Let $Q_i$ and $Q_j$ be respectively the sets of states in $C_i$ 
and $C_j$. 

1. The component $C_i$ is s-local if and only if the pair-graph on $Q_i \times Q_i$ has no cycle. 

2. The components $C_i$ and $C_j$ are pairwise s-local iff the pair-graph on $Q_i \times Q_j$ has 
no cycle. 


sLT algorithm 

(1) Compute the SCCs of G. 
(2) Let |SCC| be the number of SCCs. 
(3) for i from 1 to |SCC| do 
(4)     for j from i to |SCC| do 
(5)         Build the pair-graph G_ij on Q_i × Q_j. 
(6)         Verify that G_ij has no cycle. (* Lemma 3.2 *) 
            (* otherwise exit, the language is not strictly locally testable *) 
(7)     endfor 
(8) endfor 

Let us study the complexity of this algorithm. SCCs construction (1) can be done 
in $O(\max(m,n)) \leq O(n^2)$. Suppose that $G$ has $r$ SCCs and that $n_i$ is the number of 
states of the SCC $C_i$. For a given pair $(i,j)$, building the pair-graph and checking that it 
has no cycle is achieved in $O(s n_i n_j)$. Thus we can assert that the complexity of this 
algorithm is $O(\sum_{i=1}^{r} \sum_{j=i}^{r} s n_i n_j) \leq O(sn^2)$. 
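
The sLT test can be sketched in Python as follows. This is an illustrative sketch, not the Maple code of the package. The function names, the dictionary encoding of the transition function (missing entries mean "undefined"), and the simple quadratic SCC computation used in place of the linear algorithm of [1] are all assumptions. Only the nodes $(p,q)$ with $p \neq q$ and both components different from $*$ are built, since a cycle of the pair-graph can only go through such nodes.

```python
def sccs(states, alphabet, delta):
    """SCCs by mutual reachability (simple but not linear-time)."""
    def reach(q):
        seen, stack = {q}, [q]
        while stack:
            p = stack.pop()
            for a in alphabet:
                r = delta.get((p, a))
                if r is not None and r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    reachable = {q: reach(q) for q in states}
    comps, done = [], set()
    for q in states:
        if q not in done:
            comp = {p for p in reachable[q] if q in reachable[p]}
            comps.append(comp)
            done |= comp
    return comps


def pair_graph_has_cycle(Q1, Q2, alphabet, delta):
    """Cycle test on the part of the pair-graph on Q1 x Q2 that can carry cycles."""
    nodes = [(p, q) for p in Q1 for q in Q2 if p != q]
    succ = {}
    for (p, q) in nodes:
        out = set()
        for a in alphabet:
            r, s = delta.get((p, a)), delta.get((q, a))
            if r in Q1 and s in Q2 and r != s:
                out.add((r, s))
        succ[(p, q)] = out
    # Repeatedly remove nodes of in-degree 0; leftover nodes lie on or after a cycle.
    indeg = {v: 0 for v in nodes}
    for v in nodes:
        for w in succ[v]:
            indeg[w] += 1
    queue = [v for v in nodes if indeg[v] == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return removed < len(nodes)


def is_strictly_locally_testable(states, alphabet, delta):
    """Double loop of the sLT algorithm over all pairs (C_i, C_j), j >= i."""
    comps = sccs(states, alphabet, delta)
    for i in range(len(comps)):
        for j in range(i, len(comps)):
            if pair_graph_has_cycle(comps[i], comps[j], alphabet, delta):
                return False
    return True


# Trim minimal automata of (ab)+ and of (aa)*b
ab_plus = {(0, "a"): 1, (1, "b"): 2, (2, "a"): 1}
even_a = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 2}
print(is_strictly_locally_testable([0, 1, 2], "ab", ab_plus))  # True
print(is_strictly_locally_testable([0, 1, 2], "ab", even_a))   # False
```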


3.2. Algorithm for LT languages 


We have implemented the algorithm due to Kim et al. [6] whose time complexity 
is $O(sn^2)$. The following lemma is particularly useful. 


Lemma 3.3. Let $C_j$ be an s-local SCC of the state transition graph of a deterministic 
automaton $M = (\Sigma, Q, i, F, \delta)$ and let $C_0$ be the reaching component of $C_j$. Let $Q_0$ 
and $Q_j$ be the sets of states in $C_0$ and $C_j$ and let $G_{0j}$ be the pair-graph on $Q_0 \times Q_j$. 
Then the state transition graph of $M$ is not TS-local w.r.t. $C_j$ if and only if there 
is a path in $G_{0j}$ from an SCC to a node of the form $(t,*)$ or $(*,t)$. 


Let $G$ be the state transition graph of the automaton $M$. Let $C_0$ be the component 
of the graph which consists of all the states from which $C_j$ is reachable. Let $Q_0$ (resp. 
$Q_j$) be the set of states in $C_0$ (resp. $C_j$). 


LT algorithm 

(1) Repeat 
(2)     Choose an SCC C_j of G without descendant. 
(3)     Compute C_0. 
(4)     Compute the pair-graph G_0j on Q_0 × Q_j. 
        (* make use of Lemma 3.3 in order to perform the following test *) 
(5)     if G is TS-local w.r.t. C_j then 
(6)         G := G − C_j (* delete C_j *) 
(7)     else 
(8)         exit (* G is not locally testable *) 
(9)     endif 
(10) until G has no SCC 


3.3. Algorithm for SLT languages 


First we will introduce two definitions and a new theorem from which we deduce 
our algorithm for testing whether or not an automaton recognizes a strongly locally 
testable language. 


Definition 3.4 (Product-graph of an SCC). Let $M = (\Sigma, Q, i, F, \delta)$ be an automaton. 
Let $C \subseteq Q$ be an SCC of $M$. The product-graph of the SCC $C$ is the directed graph 
$G(V,E)$ where $V = \{(p,q) = (\delta(i,w), \delta(r,w)) \mid r \in C, \delta(r,w) \in C, w \in \Sigma^*\}$ and $E = 
\{((p,q), a, (p',q')) \mid a \in \Sigma, \delta(p,a) = p', \delta(q,a) = q'\}$. 


Definition 3.5 (Attracting point). Let $G = (X, U)$ be a directed graph. An attracting 
point is an SCC which has no descendant. 


Theorem 3.6. Let $L$ be a locally testable language and let $M = (\Sigma, Q, i, F, \delta)$ be its 
minimal complete automaton. The following two properties are equivalent: 

1. For every SCC $C$ of $M$, the product-graph of $C$ has exactly one attracting point. 
2. For every SCC $C$ of $M$, we have $L_C \subseteq P_C$ or $L_C \cap P_C = \emptyset$. 


Proof. We first prove $1 \Rightarrow 2$ by showing that if $L_C \not\subseteq P_C$ and $L_C \cap P_C \neq \emptyset$ then we 
have a word $w \in L_C \cap P_C$. This word is the label of two paths leading to the state $p \in C$ 
(one from the initial state, another from a state of the SCC $C$). It follows that $(p,p)$ 
is a state of an attracting point of the product-graph. Next, we prove that for 
a word $w' \in L_C$ with $w' \notin P_C$, we can build a state $(p_1,q_1)$ in the product-graph. There is 
no path from $(p_1,q_1)$ to $(p,p)$, so there is a second attracting point. The proof of the 
converse is in the full paper. □ 


The SLT algorithm follows directly from this theorem. The function product-graph 
computes the product-graph of an SCC $C$. It is based on a universal generation technique. 
Starting from each of the vertices $(i,e)$ where $e$ is a state in $C$, we produce the vertices 
of the product-graph according to Definition 3.4. 


function product-graph(C, M) 
    X ← ∅ 
    U ← ∅ 
    foreach e in C do 
        X ← X ∪ {(i, e)} 
    endfor 
    Xt ← X 
    while Xt ≠ ∅ do 
        take (e, e') in Xt   (* (e, e') is removed from Xt *) 
        foreach letter in Σ do 
            if δ(e', letter) ∈ C then 
                f ← δ(e, letter) 
                f' ← δ(e', letter) 
                if (f, f') ∉ X then 
                    X ← X ∪ {(f, f')} 
                    Xt ← Xt ∪ {(f, f')} 
                endif 
                U ← U ∪ {((e, e'), (f, f'))} 
            endif 
        endfor 
    endwhile 
    return (G = (X, U)) 
end 

The function one-attracting-point relies on the computation of a transitive closure. If there 
exists a unique attracting point $C$, any state of $C$ can be reached from any state of $G$. 
So we compute the transitive closure of each state of $G$ and verify that the intersection 
of these closures is not empty. 


function one-attracting-point(G = (X, U)) 
    Inter ← X 
    foreach x in X do 
        transitive[x] ← ∅ 
    endfor 
    foreach (x, y) in U do 
        transitive[x] ← transitive[x] ∪ {y} 
    endfor 
    foreach x such that transitive[x] ≠ ∅ do 
        K ← transitive[x] 
        while K ≠ ∅ do 
            Tm ← transitive[x] 
            foreach y in K such that (y, z) ∈ U do 
                transitive[x] ← transitive[x] ∪ {z} 
            endfor 
            K ← transitive[x] \ Tm 
        endwhile 
        Inter ← Inter ∩ transitive[x] 
    endfor 
    return (Inter ≠ ∅) 
end 


We can now state the algorithm for testing whether or not an automaton recognizes 
a strongly locally testable language. 


SLT algorithm 

(1) Check if M is locally testable 
(2) Compute the SCCs C_i of M 
(3) for i from 1 to |SCC| do 
(4)     G ← product-graph(C_i, M) 
(5)     if one-attracting-point(G) = false then 
(6)         return(false) 
(7)     endif 
(8) endfor 
(9) return(true) 


The complexity of the local testability test is in $O(sn^2)$. We test the conditions (4) 
and (5) in $O(\sum_i s|Q| \times |Q_i|) \leq O(sn^2)$. Hence the complexity of the SLT algorithm is 
$O(sn^2)$. 
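
The loop of the SLT algorithm (steps (3) to (8)) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Maple code: the automaton is assumed complete and given as a dictionary `delta`, the SCCs are passed in explicitly, and the names `product_graph`, `one_attracting_point` and `every_scc_has_one_attracting_point` are hypothetical. The second function uses a reflexive reachability set for each vertex, a slight variation on the pseudo-code above, and does not try to match its complexity bound. The local testability check of step (1) is not included.

```python
def product_graph(C, initial, alphabet, delta):
    """Vertices and edges of the product-graph of the SCC C (Definition 3.4)."""
    X = {(initial, e) for e in C}
    U = set()
    todo = list(X)
    while todo:
        e, e2 = todo.pop()
        for a in alphabet:
            f2 = delta.get((e2, a))
            if f2 in C:                  # the second component must stay in C
                f = delta[(e, a)]        # defined because the automaton is complete
                if (f, f2) not in X:
                    X.add((f, f2))
                    todo.append((f, f2))
                U.add(((e, e2), (f, f2)))
    return X, U


def one_attracting_point(X, U):
    """True iff some vertex is reachable from every vertex, which holds
    exactly when the graph has a single SCC without descendant."""
    succ = {x: set() for x in X}
    for x, y in U:
        succ[x].add(y)
    inter = set(X)
    for x in X:
        seen, stack = {x}, [x]           # reflexive closure (assumption)
        while stack:
            v = stack.pop()
            for w in succ[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        inter &= seen
    return bool(inter)


def every_scc_has_one_attracting_point(components, initial, alphabet, delta):
    return all(one_attracting_point(*product_graph(C, initial, alphabet, delta))
               for C in components)


# Words containing the letter a: 0 is initial, 1 is final.
contains_a = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1}
print(every_scc_has_one_attracting_point([{0}, {1}], 0, "ab", contains_a))

# Words starting with a: 0 is initial, 1 is final, 2 is a sink.
starts_with_a = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 1,
                 (2, "a"): 2, (2, "b"): 2}
print(every_scc_has_one_attracting_point([{0}, {1}, {2}], 0, "ab", starts_with_a))
```

In the second example, the product-graph of the SCC $\{1\}$ contains the two vertices $(1,1)$ and $(2,1)$, each of which loops on itself, so it has two attracting points.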


3.4. Algorithm for PT languages 


This algorithm is described by Stern in [11] and has a O(sn>) time complexity. To 
each state q is associated the set 


$$\Sigma(q) = \{a \in \Sigma \mid \delta(q,a) = q\}.$$


If $q$ and $q'$ are distinct maximal states of $C$ then they are also distinct maximal states 
of some component of $G_\Gamma = (Q, \delta_\Gamma)$ where $\Gamma = \Sigma(q) \cap \Sigma(q')$, hence condition (2) of 
Theorem 2.15 can be restricted to the subsets $\Gamma$ of the form $\Sigma(q) \cap \Sigma(q')$. 


Proposition 3.7. Let $G$ be the state transition graph of a minimal deterministic 
automaton $M = (\Sigma, Q, i, F, \delta)$. If $G$ is acyclic then $q \in Q$ is a maximal state of a 
component $C$ of $G_\Gamma = (Q, \delta_\Gamma)$ iff 

1. $q \in C$. 

2. $\Gamma \subseteq \Sigma(q)$. 


PT algorithm 

(1) Find all SCCs of M. 
(2) for each SCC C_i do 
(3)     verify that C_i is acyclic 
(4) endfor 
(5) for q in Q do 
(6)     compute Σ(q) 
(7) endfor 
(8) for q in Q do 
(9)     for q' in Q do 
(10)        if q ≠ q' then 
(11)            compute G' = (Q, δ_Γ) where Γ = Σ(q) ∩ Σ(q') 
(12)            if Γ ≠ ∅ then 
(13)                compute G'' = (Q, U) the transitive closure of G' 
(14)                for p in Q do 
(15)                    if (p, q) ∈ U and (p, q') ∈ U then 
(16)                        exit (* L is not piecewise testable *) 
(17)                    endif 
(18)                endfor 
(19)            endif 
(20)        endif 
(21)    endfor 
(22) endfor (* L is piecewise testable *) 
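
The PT algorithm above can be sketched in Python as follows. This is an illustration, not the Maple implementation, and it does not aim at the stated complexity: reachability in $G_\Gamma$ is recomputed for every state instead of using a shared transitive closure. The function names and the dictionary encoding of $\delta$ are assumptions; the input is assumed to be the minimal automaton, and "acyclic" is read as in Section 2.4 (every SCC has a single state, self-loops being allowed).

```python
from itertools import combinations


def reach(p, letters, delta):
    """States reachable from p using only the given letters (p included)."""
    seen, stack = {p}, [p]
    while stack:
        v = stack.pop()
        for a in letters:
            r = delta.get((v, a))
            if r is not None and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def is_piecewise_testable(states, alphabet, delta):
    # Steps (1)-(4): the graph must be acyclic once self-loops are ignored.
    succ = {q: {delta[(q, a)] for a in alphabet
                if (q, a) in delta and delta[(q, a)] != q} for q in states}
    indeg = {q: 0 for q in states}
    for q in states:
        for r in succ[q]:
            indeg[r] += 1
    queue = [q for q in states if indeg[q] == 0]
    removed = 0
    while queue:
        q = queue.pop()
        removed += 1
        for r in succ[q]:
            indeg[r] -= 1
            if indeg[r] == 0:
                queue.append(r)
    if removed < len(states):
        return False
    # Steps (5)-(7): letters that loop on each state.
    loops = {q: {a for a in alphabet if delta.get((q, a)) == q} for q in states}
    # Steps (8)-(22): no state may reach two distinct states q, q' in G_Gamma.
    for q, q2 in combinations(states, 2):
        gamma = loops[q] & loops[q2]
        if not gamma:
            continue
        for p in states:
            r = reach(p, gamma, delta)
            if q in r and q2 in r:
                return False
    return True


# Words having "ab" as a subword (piecewise testable).
sub_ab = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2,
          (2, "a"): 2, (2, "b"): 2}
print(is_piecewise_testable([0, 1, 2], "ab", sub_ab))         # True

# Words starting with a (not piecewise testable).
starts_with_a = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 1,
                 (2, "a"): 2, (2, "b"): 2}
print(is_piecewise_testable([0, 1, 2], "ab", starts_with_a))  # False
```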


4. Conclusion 


LANGAGE is the third package of AGL, the new version of AG [5]. An algorithm 
of conversion from a "Glushkov" automaton to the language it recognizes is also 
implemented in LANGAGE. The whole package is to be interfaced on the World Wide 
Web. Initially, this interface will allow users to input regular expressions. Next, 
graphical inputs will be possible. Other language tests are already under investigation, 
such as the one for threshold locally testable languages, which are introduced in [2]. 
This package as well as AG is available via anonymous ftp at ftp.dir.univ-rouen.fr in 
the directory pub/MAPLE/AG. 